Hunting of the Snark: Finding Data Glitches using Data Mining Methods
نویسندگان
چکیده
Data quality is critical to data analysis because bad data can lead to incorrect conclusions. Problems with data are best detected early, before too much time and eeort are spent ingesting and analyzing it. In this paper, we propose the use of data mining techniques for the automatic detection of data problems commonly encountered in large multivariate data sets. Data mining methods are ideal for this purpose, since they are designed for nding abnormal patterns in large volumes of data. We discuss some important types of data integrity issues. We demonstrate the use of a data mining method, the DataSphere set comparison technique (from our earlier work 6]) to detect glitches that mimic the error conditions discussed, using artiicial data.
منابع مشابه
Hunting Data Glitches in Massive Time Series Data
In a previous paper [5] presented at IQ’99, we had proposed a method for isolating data glitches in massive data sets using a data mining method called DataSpheres. The technique runs in linear time, isolating sections of data that contain corrupted or abnormal data. In this paper, we propose using the DataSphere technique to isolate problems in time series data. We define two types of multivar...
متن کاملUsing a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we want to improve association rules in order to be used in recommenders. Recommender systems present a method to create the personalized offers. One of the most important types of recommender systems is the collaborative filtering that deals with data mining in user information and offering them the appropriate item. Among the data mining methods, finding frequent item sets and ...
متن کاملAutomated detection of coronavirus disease (COVID-19) by using data-mining techniques: a brief report
Background: The clinical field has vast sick data that has not been analyzed. Discovering a way to analyze this raw data and turn it into an information treasure can save many lives. Using data mining methods is an efficient way to analyze this large amount of raw data. It can predict the future with accurate knowledge of the past, providing new insights into disease diagnosis and prevention. S...
متن کاملPrediction of Student Learning Styles using Data Mining Techniques
This paper focuses on the prediction of student learning styles using data mining techniques within their institutions. This prediction was aimed at finding out how different learning styles are achieved within learning environments which are specifically influenced by already existing factors. These learning styles, have been affected by different factors that are mainly engraved and found wit...
متن کاملIdentification of Fraud in Banking Data and Financial Institutions Using Classification Algorithms
In recent years, due to the expansion of financial institutions,as well as the popularity of the World Wide Weband e-commerce, a significant increase in the volume offinancial transactions observed. In addition to the increasein turnover, a huge increase in the number of fraud by user’sabnormality is resulting in billions of dollars in lossesover the world. T...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999